Sentence disambiguation by document oriented preference sets
Authors
Abstract
1. Introduction
Ambiguity in sentence interpretation is a major problem in natural language processing (NLP). Conventional NLP systems often use ad hoc or extremely large knowledge bases (pragmatic / semantic / commonsense) to eliminate ambiguities. Such systems are too slow and sometimes provide incomplete analyses, and they have the further handicap that very large knowledge bases are needed. Asking the user for confirmation [Nishida 1982] is a practical way to obtain correct parse-trees, but this confirmation is not useful for further computations. A practical NLP system should produce accurate results automatically while using a simple method and simple knowledge.
Preference models [Petitpierre 1987, Fass 1983, Schubert 1984], such as preference semantics, scoring, and syntactic preference, are good candidates for a practical NLP system because they utilize simple ready-made knowledge such as semantic markers or case frame dictionaries. The most difficult problem with preference models is selecting the most appropriate preference knowledge, that is, the knowledge that will induce a correct interpretation. However, preference knowledge extracted from a large corpus or an on-line dictionary [Jensen 1987] induces preference knowledge conflicts which block complete disambiguation.
2. The concept of DoPS
Syntactic rules are capable of producing many parse-trees for a sentence. These parse-trees are syntactically correct, but most are incorrect from the viewpoints of semantic meaning, contextual meaning, common sense, and field-specific knowledge. It is necessary to use appropriate knowledge (semantic / contextual / commonsense / specific field) to eliminate the incorrect interpretations. For example, consider passage 1 of Figure 1. There are two possible interpretations for the gerund-phrase attachment.
(1) The power supply [illegible] for charging [illegible] having a voltage-temperature coefficient ... (Passage 1; beginning of target sentence)
... the voltage-temperature coefficient of being charged ... (Passage 2; part of target sentence)
Similar resources
Periods, Capitalized Words, etc
In this article we present an approach for tackling three important aspects of text normalization: sentence boundary disambiguation, disambiguation of capitalized words in positions where capitalization is expected, and identification of abbreviations. As opposed to the two dominant techniques of computing statistics or writing specialized grammars, our document-centered approach works by consi...
A hybrid approach for Urdu sentence boundary disambiguation
Sentence boundary identification is a preliminary step in preparing a text document for Natural Language Processing tasks, e.g., machine translation, POS tagging, and text summarization. We present a hybrid approach for Urdu sentence boundary disambiguation comprising a unigram statistical model and a rule-based algorithm. After implementing this approach, we obtained 99.48% precision, 86.3...
A preference learning approach to sentence ordering for multi-document summarization
Ordering information is a difficult but important task for applications that generate natural language texts, such as multi-document summarization, question answering, and concept-to-text generation. In multi-document summarization, information is selected from a set of source documents. Therefore, the optimal ordering of those selected pieces of information to create a coherent summary is not obv...
Knowledge-based Word Sense Disambiguation using Topic Models
Word Sense Disambiguation is an open problem in Natural Language Processing which is particularly challenging and useful in the unsupervised setting where all the words in any given text need to be disambiguated without using any labeled data. Typically WSD systems use the sentence or a small window of words around the target word as the context for disambiguation because their computational co...
Disambiguation of Super Parts of Speech (or Supertags): Almost
In a lexicalized grammar formalism such as Lexicalized Tree-Adjoining Grammar (LTAG), each lexical item is associated with at least one elementary structure (supertag) that localizes syntactic and semantic dependencies. Thus a parser for a lexicalized grammar must search a large set of supertags to choose the right ones to combine for the parse of the sentence. We present techniques for disambi...